Take-aways
(To be fair, I should also look at each gender's proportion of the work force.)
setwd("~/Desktop/tidytuesday/data")
aus<-read.csv("week4_australian_salary.csv")
library(ggplot2)
library(plotly) ## used to hover and see the job names
aus[grep("stat",aus$occupation),] ## looking for statistics; only "station" and "estate" jobs match
## X gender_rank occupation gender
## 1131 1131 907 Garage attendant; Service station attendant Female
## 1132 1132 979 Garage attendant; Service station attendant Male
## 1786 1786 170 Railway station manager Female
## 1787 1787 174 Railway station manager Male
## 1792 1792 250 Real estate agency manager Female
## 1793 1793 111 Real estate agency manager Male
## 1794 1794 305 Real estate agent Female
## 1795 1795 239 Real estate agent Male
## 1796 1796 538 Real estate property manager Female
## 1797 1797 210 Real estate property manager Male
## 1994 1994 385 Stock and station agent Female
## 1995 1995 457 Stock and station agent Male
## individuals average_taxable_income
## 1131 2434 31906
## 1132 2678 34126
## 1786 196 74737
## 1787 1220 97952
## 1792 2326 66271
## 1793 2437 110559
## 1794 6997 62056
## 1795 10983 88045
## 1796 18088 49080
## 1797 6708 92500
## 1994 108 57899
## 1995 1204 67675
aus[grep("math",aus$occupation),] ## nope
## [1] X gender_rank occupation
## [4] gender individuals average_taxable_income
## <0 rows> (or 0-length row.names)
scientist=aus[grep("scien",aus$occupation),] ## bingo
engineer=aus[grep("engineer",aus$occupation),]
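One caveat on these searches: `grep()` is case-sensitive by default, so a lowercase pattern such as "stat" matches "station" and "estate" but would skip any title that begins with "Stat". Below is a minimal sketch of a case-insensitive search. The `toy` data frame and its job titles are made up for illustration; I have not checked which of them appear in the real file.

```r
# Hypothetical toy data, not rows from week4_australian_salary.csv
toy <- data.frame(
  occupation = c("Statistician", "Railway station manager",
                 "Mathematician", "Real estate agent"),
  stringsAsFactors = FALSE
)

# Case-sensitive (the default): only a lowercase "stat" is matched
strict <- toy[grep("stat", toy$occupation), , drop = FALSE]

# Case-insensitive: also catches titles that start with a capital letter
loose <- toy[grep("stat", toy$occupation, ignore.case = TRUE), , drop = FALSE]

print(strict$occupation)
print(loose$occupation)
```

The same `ignore.case = TRUE` argument could be added to the "math", "scien" and "engineer" searches, which may otherwise miss titles such as those starting with "Engineer".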
Get things organized. Not particularly tidy, but bear with me.
scientistG=split(scientist,scientist$gender)
engineerG=split(engineer,engineer$gender)
names(scientistG[[1]])=paste("F",names(scientistG[[1]]),sep="")
names(scientistG[[2]])=paste("M",names(scientistG[[2]]),sep="")
names(engineerG[[1]])=paste("F",names(engineerG[[1]]),sep="")
names(engineerG[[2]])=paste("M",names(engineerG[[2]]),sep="")
scientistFull=cbind(scientistG[[1]],scientistG[[2]])
engineerFull=cbind(engineerG[[1]],engineerG[[2]])
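Note that `cbind()` pairs rows by position, so the step above relies on every occupation appearing once per gender and in the same order in both halves. A sketch of a more defensive alternative is to join on the occupation name with `merge()`. The toy data and the `_F`/`_M` suffixes are my own assumptions for illustration; they differ from the `F`/`M` prefixes used above.

```r
# Hypothetical toy data with the same column names as the dataset
toy <- data.frame(
  occupation = c("Chemist", "Chemist", "Geologist", "Physicist"),
  gender = c("Female", "Male", "Female", "Male"),
  individuals = c(100, 150, 40, 300),
  average_taxable_income = c(70000, 80000, 90000, 95000),
  stringsAsFactors = FALSE
)

female <- toy[toy$gender == "Female", ]
male   <- toy[toy$gender == "Male", ]

# Inner join on occupation; the suffixes mark which gender a column came from.
# Occupations that lack one gender (Geologist, Physicist here) are dropped.
paired <- merge(female, male, by = "occupation", suffixes = c("_F", "_M"))
print(paired)
```

With a join, an occupation that is missing one gender is dropped rather than silently paired with the wrong row.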
The line is y=x. If there were gender parity in head count, the points would lie close to this line. You can hover to see the job titles.
p <- ggplot(scientistFull, aes(x = Findividuals, y = Mindividuals, text = Moccupation)) +
geom_point() + geom_abline(intercept = 0, slope = 1) + xlab("number of female individuals") +
ylab("number of male individuals") + ggtitle("Science Jobs")
p ## for static version on github
p <- ggplotly(p)
p
p <- ggplot(engineerFull, aes(x = Findividuals, y = Mindividuals, text = Moccupation)) +
geom_point() + geom_abline(intercept = 0, slope = 1) + xlab("number of female individuals") +
ylab("number of male individuals") + ggtitle("Engineer Jobs")
p ## for static version on github
p <- ggplotly(p)
p
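To follow up on the work-force caveat from the take-aways, one option is to turn the two head counts into a share of women per occupation. This is only a sketch on made-up numbers; `female_share` is a column name I am introducing here, and the toy data frame just mimics the layout of `scientistFull` and `engineerFull`.

```r
# Hypothetical counts laid out like scientistFull / engineerFull
toy <- data.frame(
  Moccupation = c("Chemist", "Geologist", "Physicist"),
  Findividuals = c(100, 40, 300),
  Mindividuals = c(150, 160, 300)
)

# Share of women in each occupation; 0.5 corresponds to a point on the y = x line
toy$female_share <- toy$Findividuals / (toy$Findividuals + toy$Mindividuals)

# Sort so the most male-dominated occupations come first
print(toy[order(toy$female_share), ])
```

A share is easier to compare across occupations of very different sizes than the raw counts in the scatter plots.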
Again the line is y=x. If there were gender parity in pay, the points would lie close to this line. You can hover to see the job titles.
p <- ggplot(scientistFull, aes(x = Faverage_taxable_income, y = Maverage_taxable_income, text =Moccupation)) +
geom_point() +geom_abline(intercept = 0, slope = 1)+xlab("average taxable income for females ($)")+
ylab("average taxable income for males ($)")+ggtitle("Science Jobs")
p ## for static version on github
p <- ggplotly(p)
p
p <- ggplot(engineerFull, aes(x = Faverage_taxable_income, y = Maverage_taxable_income, text =Moccupation)) +
geom_point() +geom_abline(intercept = 0, slope = 1)+xlab("average taxable income for females ($)")+
ylab("average taxable income for males ($)")+ggtitle("Engineer Jobs")
p ## for static version on github
p <- ggplotly(p)
p
lm(scientistG[[2]]$Maverage_taxable_income~scientistG[[1]]$Faverage_taxable_income)
##
## Call:
## lm(formula = scientistG[[2]]$Maverage_taxable_income ~ scientistG[[1]]$Faverage_taxable_income)
##
## Coefficients:
## (Intercept)
## -14063.862
## scientistG[[1]]$Faverage_taxable_income
## 1.521
lm(engineerG[[2]]$Maverage_taxable_income~engineerG[[1]]$Faverage_taxable_income)
##
## Call:
## lm(formula = engineerG[[2]]$Maverage_taxable_income ~ engineerG[[1]]$Faverage_taxable_income)
##
## Coefficients:
## (Intercept)
## 6543.508
## engineerG[[1]]$Faverage_taxable_income
## 1.261
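Both fitted slopes are above 1, which means the male average rises by more than a dollar for each extra dollar of the female average. The calls above index into two separate lists, which makes the coefficient labels long; a sketch of the same kind of fit written with a formula and `data =` follows. The numbers are made up so that the male income is exactly 1.5 times the female income minus 10000, which is not a result from the dataset.

```r
# Hypothetical incomes constructed so that M = 1.5 * F - 10000 exactly
toy <- data.frame(
  Faverage_taxable_income = c(40000, 60000, 80000, 100000),
  Maverage_taxable_income = c(50000, 80000, 110000, 140000)
)

# Regress male average income on female average income
fit <- lm(Maverage_taxable_income ~ Faverage_taxable_income, data = toy)

# Intercept and slope; under pay parity we would expect roughly 0 and 1
print(coef(fit))
```

On the real data the same pattern would presumably be `lm(Maverage_taxable_income ~ Faverage_taxable_income, data = scientistFull)`, assuming the rows of the combined data frame are correctly paired by occupation.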